Learning Nearest-Neighbor Quantizers from Labeled Data by Information Loss Minimization

Authors

  • Svetlana Lazebnik
  • Maxim Raginsky
Abstract

This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic for its class label Y. We derive an alternating minimization procedure for learning a finite set of prototypes that partition the feature space via the nearest-neighbor encoding rule with respect to the Euclidean distance, while simultaneously partitioning the simplex of class distributions by the nearest-neighbor rule with respect to the Kullback-Leibler divergence. The resulting quantizer can be used to simultaneously encode unlabeled points outside the training set and to predict their posterior class distributions, and has an elegant interpretation in terms of universal lossless coding. The promise of our method is demonstrated for the application of learning discriminative visual vocabularies for bag-of-features image classification.
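The alternating procedure described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the combined distortion with trade-off weight `lam`, the random initialization, and all function names are assumptions. Each iteration assigns points by squared Euclidean distance in feature space plus a KL term on the class posteriors, then recomputes each cell's centroid in both spaces.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # KL divergence between each row of p and a single distribution q
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def fit_quantizer(X, P, m, lam=1.0, n_iter=50, seed=0):
    """Alternating-minimization sketch (hypothetical names/parameters).

    X : (n, d) continuous features
    P : (n, c) class posteriors p(y|x) for the training points
    m : number of prototypes
    lam : Euclidean-vs-KL trade-off weight (an assumption, not from the paper)
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    mu = X[idx].copy()   # feature-space prototypes
    q = P[idx].copy()    # class-distribution prototypes on the simplex
    for _ in range(n_iter):
        # Assignment step: nearest neighbor under the combined distortion
        d_feat = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)      # (n, m)
        d_kl = np.stack([kl(P, q[k]) for k in range(m)], axis=1)      # (n, m)
        assign = np.argmin(d_feat + lam * d_kl, axis=1)
        # Update step: centroid of each cell in both spaces
        for k in range(m):
            mask = assign == k
            if mask.any():
                mu[k] = X[mask].mean(0)
                q[k] = P[mask].mean(0)
    return mu, q, assign
```

At test time, an unlabeled point would be encoded by the Euclidean nearest-neighbor rule over `mu` alone, and its predicted posterior read off as the corresponding row of `q`, matching the encode-and-predict use described in the abstract.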


Similar references

Using Weighted Nearest Neighbor to Benefit from Unlabeled Data

The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unla...


Supervised Manifold Learning with Incremental Stochastic Embeddings

In this paper, we introduce an incremental dimensionality reduction approach for labeled data. The algorithm incrementally samples in latent space and chooses a solution that minimizes the nearest-neighbor classification error, taking label information into account. We introduce and compare two optimization approaches to generate supervised embeddings, i.e., an incremental solution construction ...


Non-zero probability of nearest neighbor searching

Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition, and computational geometry. The goal of NN searching is to efficiently report the data nearest to a given query object. In most studies both the data and the query are assumed to be precise; however, due to the real applications of NN searching, suc...


A New Active Learning Technique Using Furthest Nearest Neighbour Criterion for K-NN and SVM Classifiers

Active learning is a supervised learning method based on the idea that a machine learning algorithm can achieve greater accuracy with fewer labelled training images if it is allowed to choose the image from which it learns. Facial age classification is a technique to classify face images into one of several predefined age groups. The proposed study applies an active learning approac...


Conscientiousness Measurement from Weibo's Public Information

We apply a graph-based semi-supervised learning algorithm to identify the conscientiousness of Weibo users. Given a set of Weibo users' public information (e.g., number of followers) and a few labeled Weibo users, the task is to predict conscientiousness assessments for the numerous unlabeled Weibo users. The singular value decomposition (SVD) technique is used for feature reduction, and K nearest neighbo...
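The SVD-plus-kNN pipeline mentioned in the paragraph above can be sketched generically. This is not the paper's graph-based semi-supervised algorithm; the function names, the mean-centering step, and the choice of k are all assumptions for illustration.

```python
import numpy as np

def svd_reduce(X, k):
    # Center the data and project onto the top-k right singular vectors
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def knn_predict(X_train, y_train, X_test, k=3):
    # Squared Euclidean distances from each test point to every training point
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d, axis=1)[:, :k]
    # Majority vote among the k nearest labeled neighbors
    return np.array([np.bincount(y_train[row]).argmax() for row in nn])
```

In a pipeline like the one described, `svd_reduce` would be applied to the user-feature matrix first, and `knn_predict` run in the reduced space.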



Publication date: 2007